Object detection has improved substantially in recent years, but detection results for small objects remain unsatisfactory. To address this issue, this work proposes a strategy based on feature fusion and dilated convolution, in which dilated convolution broadens the receptive field of feature maps at various scales. On the one hand, this improves the detection accuracy of larger objects. On the other hand, it provides more contextual information for small objects, which helps improve their detection accuracy. Shallow semantic information about small objects is obtained by filtering out the noise in the feature map, and more small-object feature information is preserved by a multi-scale feature fusion module and an attention mechanism. Fusing this shallow feature information with deep semantic information generates richer feature maps for small object detection. Experiments show that the method is more accurate than the traditional YOLOv3 network in detecting small objects and occluded objects. In addition, it achieves 32.8\% mean Average Precision (mAP) on small objects on the MS COCO2017 test set. For $640 \times 640$ input, it reaches 88.76\% mAP on the PASCAL VOC2012 dataset.
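The effect of dilation on the receptive field mentioned above can be illustrated with a little arithmetic. The following is a minimal sketch, not the network from the abstract: the function names and the example layer stacks are assumptions chosen for illustration. It relies only on the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span, in input positions, covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of convolution layers along one axis.

    `layers` is a list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    rf = 1    # a single output position initially sees one input position
    jump = 1  # distance, in input positions, between adjacent outputs
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks: three plain 3x3 layers vs. the same layers with dilations 1, 2, 4.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]
print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of weights per layer, the dilated stack sees roughly twice as much context per axis, which is the kind of extra surrounding information the abstract argues helps small objects.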
Dunhuang murals combine Chinese style and national style, forming a self-contained Chinese-style Buddhist art of very high historical, cultural, and research value. Their lines are highly generalized and expressive, reflecting each figure's distinctive character and complex inner emotions. Outline drawings of the murals are therefore of great significance to research on Dunhuang culture. Contour generation for Dunhuang murals is an image edge detection task, an important branch of computer vision that aims to extract salient contour information from images. Convolution-based deep learning networks have achieved good results in image edge extraction by exploring the contextual and semantic features of images. However, as the receptive field grows, some local detail information is lost, which prevents these networks from generating reasonable outline drawings of murals. In this paper, we propose a novel edge detector that combines self-attention with convolution to generate line drawings of Dunhuang murals. First, a new residual self-attention and convolution mixed module (Ramix) is proposed to fuse local and global features in feature maps. Second, a novel densely connected backbone extraction network is designed to efficiently propagate rich edge feature information from shallow layers into deep layers. Experiments on different public datasets show that our method generates sharper and richer edge maps than existing methods. In addition, tests on the Dunhuang mural dataset show that our method achieves very competitive performance.
Federated learning (FL) enables distributed model training from local data collected by users. In distributed systems with constrained resources and potentially high dynamics, e.g., mobile edge networks, the efficiency of FL is an important problem. Existing works have separately considered different configurations to make FL more efficient, such as infrequent transmission of model updates, client subsampling, and compression of update vectors. However, an important open problem is how to jointly apply and tune these control knobs in a single FL algorithm, to achieve the best performance by allowing a high degree of freedom in control decisions. In this paper, we address this problem and propose FlexFL - an FL algorithm with multiple options that can be adjusted flexibly. Our FlexFL algorithm allows both arbitrary rates of local computation at clients and arbitrary amounts of communication between clients and the server, making both the computation and communication resource consumption adjustable. We prove a convergence upper bound of this algorithm. Based on this result, we further propose a stochastic optimization formulation and algorithm to determine the control decisions that (approximately) minimize the convergence bound, while conforming to constraints related to resource consumption. The advantage of our approach is also verified using experiments.
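Automatic image segmentation is critical to visual analysis. Autoencoder architectures perform satisfactorily on a variety of image segmentation tasks. However, autoencoders based on convolutional neural networks (CNNs) seem to have hit a bottleneck in improving the accuracy of semantic segmentation. Increasing the inter-class distance between foreground and background is an inherent characteristic of segmentation networks. However, segmentation networks pay too much attention to the main visual differences between foreground and background and ignore detailed edge information, which reduces the accuracy of edge segmentation. In this paper, we propose a lightweight end-to-end segmentation framework based on multi-task learning, termed Edge Attention Autoencoder Network (EAA-Net), to improve edge segmentation ability. Our approach not only uses the segmentation network to obtain inter-class features, but also applies a reconstruction network to extract intra-class features within the foreground. We further design an intra-class and inter-class feature fusion module, the I2 fusion module. The I2 fusion module merges intra-class and inter-class features and uses a soft attention mechanism to remove invalid background information. Experimental results show that our method performs well on medical image segmentation tasks. EAA-Net is easy to implement and has a small computational cost.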
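Active speaker detection plays a vital role in human-machine interaction. Recently, a few end-to-end audiovisual frameworks have emerged. However, the inference time of these models has not been explored, and their complexity and large input size make them unsuitable for real-time applications. In addition, they share a similar feature extraction strategy that applies a ConvNet to both the audio and the visual input. This work presents a novel two-stream end-to-end framework that fuses features extracted from images via VGG-M with raw Mel Frequency Cepstrum Coefficients extracted from the audio waveform. Two BiGRU layers are attached to each stream to handle that stream's temporal dynamics before fusion. After fusion, one BiGRU layer is attached to model the joint temporal dynamics. Experimental results on the AVA-ActiveSpeaker dataset show that our new feature extraction strategy is more robust to noisy signals and has a better inference time than models that use a ConvNet on both modalities. The proposed model predicts within 44.41 ms, which is fast enough for real-time applications. Our best-performing model attains 88.929% accuracy, comparable to state-of-the-art work.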
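To be safe and polite, service robots need to robustly track the people around them, especially Tour-Guide Robots (TGRs). However, existing multi-object tracking (MOT) and multi-person tracking (MPT) methods are not applicable to TGRs for two reasons: 1. a lack of relevant large-scale datasets; 2. a lack of applicable metrics for evaluating trackers. In this work, we target the visual perception tasks of TGRs and present the TGRDB dataset, a novel large-scale multi-person tracking dataset containing roughly 5.6 hours of annotated video and more than 450 long-term trajectories. In addition, we propose a metric that is better suited to evaluating trackers on our dataset. As part of our work, we present TGRMPT, a novel MPT system that combines head-shoulder and whole-body information and achieves state-of-the-art performance. We have released our code and dataset at https://github.com/wenwenzju/tgrmpt.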
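We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. In C-VFL, a server and multiple parties collaboratively train a model on their respective features, using several local iterations and periodically sharing compressed intermediate results. Our work provides the first theoretical analysis of the effect of message compression on distributed training over vertically partitioned data. We prove convergence for non-convex objectives at a rate of $O(\frac{1}{\sqrt{T}})$. We provide specific requirements for convergence with common compression techniques, such as quantization and top-$k$ sparsification. Finally, we show experimentally that compression can reduce communication by $90\%$ without a significant decrease in accuracy compared with VFL without compression.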
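The two compression techniques named above, top-$k$ sparsification and quantization, can be sketched in a few lines. This is a generic illustration under stated assumptions, not C-VFL's actual compressors: the function names, the tie-breaking rule, and the choice of a uniform quantization grid over the vector's own range are all hypothetical.

```python
def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    if k <= 0:
        return [0.0] * len(vec)
    # Indices sorted by decreasing magnitude; ties are broken by lower index.
    order = sorted(range(len(vec)), key=lambda i: (-abs(vec[i]), i))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def uniform_quantize(vec, levels):
    """Round each entry to the nearest of `levels` evenly spaced values in [min, max]."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return list(vec)  # a constant vector is already representable
    step = (hi - lo) / (levels - 1)
    return [lo + round((v - lo) / step) * step for v in vec]


# Hypothetical intermediate result that a party would send to the server.
embedding = [0.9, -0.05, 0.3, -1.2, 0.02]
print(top_k_sparsify(embedding, 2))
print(uniform_quantize([0.0, 0.1, 0.5, 0.9, 1.0], 3))
```

Both functions trade accuracy for message size: the sparsified vector needs only k index-value pairs, and the quantized vector needs only a level index per entry plus the range.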
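Graph kernels are historically the most widely used technique for graph classification tasks. However, these methods have limited performance because they rely on hand-crafted combinatorial features of graphs. In recent years, graph neural networks (GNNs) have become the state-of-the-art method for downstream graph-related tasks thanks to their superior performance. Most GNNs are based on the Message Passing Neural Network (MPNN) framework. However, recent studies show that MPNNs cannot exceed the power of the Weisfeiler-Lehman (WL) algorithm in the graph isomorphism test. To address the limitations of existing graph kernel and GNN methods, in this paper we propose a novel GNN framework, termed \textit{Kernel Graph Neural Networks} (KerGNNs), which integrates graph kernels into the message passing process of GNNs. Inspired by convolution filters in convolutional neural networks (CNNs), KerGNNs adopt trainable hidden graphs as graph filters, which are combined with subgraphs to update node embeddings using graph kernels. In addition, we show that MPNNs can be viewed as special cases of KerGNNs. We apply KerGNNs to multiple graph-related tasks and use cross-validation to make fair comparisons with benchmarks. We show that our method achieves competitive performance compared with existing state-of-the-art methods, demonstrating the potential to increase the representation ability of GNNs. We also show that the trained graph filters in KerGNNs can reveal the local graph structures of the dataset, which significantly improves model interpretability compared with conventional GNN models.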
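Federated learning (FL) algorithms usually sample a fraction of the clients in each round (partial participation) when the number of participants is large and the server's communication bandwidth is limited. Recent works on the convergence analysis of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which converges slowly in wall-clock time under high degrees of system heterogeneity and statistical heterogeneity. This paper aims to design an adaptive client sampling algorithm that tackles both system and statistical heterogeneity to minimize the wall-clock convergence time. We obtain a new tractable convergence bound for FL algorithms with arbitrary client sampling probabilities. Based on the bound, we analytically establish the relationship between the total learning time and the sampling probabilities, which results in a non-convex optimization problem for training time minimization. We design an efficient algorithm for learning the unknown parameters in the convergence bound and develop a low-complexity algorithm to approximately solve the non-convex problem. Experimental results from both a hardware prototype and simulation show that our proposed sampling scheme significantly reduces the convergence time compared with several baseline sampling schemes. Notably, on the hardware prototype our scheme spends 73% less time than the uniform sampling baseline to reach the same target loss.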
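Sampling clients with arbitrary probabilities is only useful if the server can still form an unbiased aggregate. The sketch below uses standard importance weighting for that purpose and is an assumption made for illustration, not necessarily the aggregation rule of the paper: clients are drawn independently with replacement, and the names `probs`, `data_weights`, and the scalar `updates` are hypothetical.

```python
import itertools
import random


def sample_clients(probs, num_sampled, rng):
    """Draw client indices independently (with replacement) according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=num_sampled)


def aggregate(updates, data_weights, probs, sampled):
    """Importance-weighted average of the sampled clients' scalar updates.

    Dividing by K * q_i offsets the fact that client i is drawn with
    probability q_i, so the result is unbiased for sum_i p_i * update_i.
    """
    k = len(sampled)
    return sum(data_weights[i] * updates[i] / (k * probs[i]) for i in sampled)


def expected_aggregate(updates, data_weights, probs, num_sampled):
    """Exact expectation of `aggregate`, found by enumerating every possible draw."""
    total = 0.0
    for draw in itertools.product(range(len(probs)), repeat=num_sampled):
        p_draw = 1.0
        for i in draw:
            p_draw *= probs[i]
        total += p_draw * aggregate(updates, data_weights, probs, draw)
    return total


updates = [1.0, 2.0, 4.0]       # hypothetical scalar model updates
data_weights = [0.5, 0.3, 0.2]  # p_i: each client's share of the data
probs = [0.2, 0.3, 0.5]         # q_i: non-uniform sampling probabilities

full_participation = sum(p * u for p, u in zip(data_weights, updates))
print(full_participation, expected_aggregate(updates, data_weights, probs, 2))

sampled = sample_clients(probs, 2, random.Random(0))
print(sampled, aggregate(updates, data_weights, probs, sampled))
```

The expectation matches the full-participation value for any valid `probs`, so the sampling probabilities can be tuned for wall-clock time without biasing the aggregate. What changes with `probs` is the variance of a single round.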
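We consider federated learning in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, and the silo's vertical data shard is partitioned horizontally across its clients. We propose Tiered Decentralized Coordinate Descent (TDCD), a communication-efficient decentralized training algorithm for such two-tiered networks. To reduce communication overhead, the clients in each silo perform multiple local gradient steps before sharing updates with their hub. Each hub adjusts its coordinates by averaging its workers' updates, and the hubs then exchange intermediate updates with one another. We present a theoretical analysis of our algorithm and show how the convergence rate depends on the number of vertical partitions and the number of local updates. We further validate our approach empirically via simulation-based experiments using a variety of datasets and objectives.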
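The pattern of several local gradient steps followed by averaging at a hub can be shown on a toy problem. The sketch below covers only the horizontal tier inside a single silo. It omits vertical partitioning and the hub-to-hub exchange, so it is not TDCD itself. The one-dimensional quadratic loss, the function names, and all parameter values are assumptions for illustration.

```python
def local_steps(w, target, lr, num_steps):
    """Run gradient descent on the client loss 0.5 * (w - target) ** 2."""
    for _ in range(num_steps):
        w -= lr * (w - target)  # the gradient of the loss is (w - target)
    return w


def hub_round(w, client_targets, lr, num_steps):
    """One round: every client starts from w, trains locally, and the hub averages."""
    local_models = [local_steps(w, t, lr, num_steps) for t in client_targets]
    return sum(local_models) / len(local_models)


# Hypothetical silo with three clients whose local optima differ.
targets = [1.0, 2.0, 3.0]
w = 0.0
for _ in range(20):
    w = hub_round(w, targets, lr=0.5, num_steps=2)
print(w)  # approaches 2.0, the minimizer of the average loss
```

Clients communicate once per round rather than once per gradient step, which is where the communication saving comes from. On this toy loss the averaged model still converges to the minimizer of the average objective.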